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Tempered Fermions in the Hybrid Monte Carlo Algorithm 
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Parallel tempering simulates at many quark masses simultaneously, by changing the mass during the simulation 
while remaining in equilibrium. The algorithm is faster than pure HMC if more than one mass is needed, and 
works better the smaller the smallest mass is. 



1. INTRODUCTION 

The standard algorithms used for full QCD are 
painfully slow. Physically large objects, like in- 
stantons, pions etc., may well decorrelate much 
more slowly than, say, the plaquette 

Tempered algorithms promote a param- 

eter of the theory to a dynamical variable that 
changes during the simulation, which has tremen- 
dous potential for speeding up slow simulations. 
They have been successfully implemented in /3 for 
spin glasses, U(l) etc. For QCD at zero tempera- 
ture promoting f3 does not help (although at high 
temperature, around the chiral phase transition 
it probably will), but promoting the quark mass, 
and allowing it to change during the simulation, 
does speed things up. 

The mass is only changed if the configuration 
is simultaneously in the equilibrium distribution 
of both masses. So tempering is always in equi- 
librium and requires no re-weighting etc. after- 
wards. 

The minimum gain comes from running at 
heavier (faster) masses between independent con- 
figurations. The maximum gain comes if the 
relevant auto-correlations are smaller for larger 
masses. 
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2. TEMPERING 

The quark mass, i 
Wilson fermions, becomes a dynamic variable, 
and may take a different value for each trajec- 
tory of, say, the hybrid Monte Carlo algorithm [Q . 
(Regard m below as the corresponding k for Wil- 
son fermions.) The masses used belong to an 
ordered set with Nm elements, [mmin, m,„ax]- 
The only requirement is that the action his- 
tograms of neighbouring masses overlap. 

There are two types of tempering, simulated 
and parallel tempering. The idea behind simu- 
lated tempering in QCD, investigated in is 
very simple, and as it is the basis for the parallel 
tempering investigated here it will be described 
first. 

In simulated tempering you add to the prob- 
ability distribution a constant gi for each mass 
mi, which indicates roughly where the half-way 
point between the action histograms of mass rrii 
and mi-|_i is. The original QCD probability dis- 
tribution P{U, (p) now becomes 

P{U,(j),i) oc exp[-5([/, 0,/3,TOi) + gi\. 

This distribution is simulated using your 
favourite algorithm for fixed quark mass (eg., 
HMC here), combined with Metropolis steps to 
change from rrii to mi±i. The constants gi are 
only to enable the masses to change both up and 
down, and do not affect the physics. 

The hybrid Monte Carlo algorithm insures 
that the correct Gibbs distribution is generated 
at each value of the mass, and the temper- 
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ing Metropolis step insures that the mass only 
changes if the configuration is part of the equilib- 
rium distribution of both masses. 

The constants gi can be chosen freely, depend- 
ing on what seems best for the simulation. They 
do not affect the physics, only the frequency 
with which each mass is visited. The gi can 
be fixed by choosing, for example, to visit each 
mass with equal probability, P(i) = l/N^. Then 
gi — — InZi, ie. the original free energy at fixed 
mass rrii. The choice is arbitrary, though, and 
can be optimized for speed. 

The simulation only needs gi^i—gi, and a good 
starting point is to take the first two terms below: 

Ag = -{^i:)V5m - {x)V{Smf + 0{{5mf ) 

where {^ip) and (x) are the chiral condensate 
and susceptibility. The requirement of overlap- 
ping histograms implies that Sm satisfies 

6m^l/^/{^ ^mjW. (1) 

The overhead depends on the step size Sm. As 
the susceptibility is related to the scalar meson 
mass, X = ^cr/iT^a^ the step size is large in the 
chiral limit. Hence simulated tempering becomes 
more effective for very small quark masses, with 
the gain in speed more than compensating for the 
cost of having additional masses. 

The volume dependence above is in units of 
some relevant correlation length, so taking the 
continuum limit in fixed physical volume does not 
cause the method to break down. 

A further improvement comes if you temper in 
a parallel way. If different masses are needed, 
you can happily do Nm different simulated tem- 
pering runs simultaneously on different comput- 
ers. You can even go one better, and put them 
together on one computer in a way that removes 
the need for the constants gi and improves the 
performance. This is called parallel tempering; 
results are presented in the next section. 

In parallel tempering you first generate one 
(thermalised) configuration Ci at each of the 
masses. Then use a Metropolis step to de- 
cide whether the configuration Ci at mass rrii, 
and configuration Ci+i at mass rrii-^.i should be 
swapped for the next trajectory. If this is done. 



Table 1 

Parameters and results from the tempered (T) 
and a standard HMC run. The trajectories have 
unit length. The last three columns give the ac- 
ceptance rate for the Metropolis mass changes, 
and the integrated autocorrelation times of the 
plaquette and 2x2 Wilson loop. 



m 


s/trj 


% 


'int 


W2x'2 
~int 


(T) 0.020 


475 


15 


4.9(14) 


3.6(10) 


(T) 0.024 


350 


21 


3.3(8) 


2.8(6) 


(T) 0.028 


283 


24 


4.2(15) 


3.3(10) 


(T) 0.032 


231 


21 


3.5(9) 


2.7(10) 


(T) 0.036 


193 




3.6(12) 


2.8(6) 


(HMC) 0.020 


475 




9.0(21) 


8.4(17) 


(HMC) 0.040 


166 




8.5(20) 


6.9(15) 


(HMC) 0.060 


84 




7.3(13) 


5.8(13) 



the next trajectory will run with Ci+i at mass 
rrii, and Ci at mass m.i+i. After each trajectory 
one starts trying to swap masses 1 and 2, moving 
up through the list, ending by trying to swap the 
configurations at masses Nm — 1 and A^^. 

This method doesn't need any constants gi, 
changes the mass at two rather than one config- 
uration, and has the further advantage of gener- 
ating a configuration at each mass every trajec- 
tory. Also, trajectories way out in the tail of the 
distribution stand a chance of moving more than 
one step in mass. This doesn't happen very often 
though! 

3. RESULTS 

Two sets of runs are in progress with 
four staggered fermions on an 8'^ x 12 lat- 
tice. The tempered one has five masses, 
{0.020,0.024,0.028,0.032,0.036}, called set 'T'. 
For comparison there are three standard HMC 
runs at masses 0.020, 0.040 and 0.060. The run 
parameters are given in table |^. So far about 2000 
units in r have been run for T, and about 3500 
for the HMC run. 

Which of the five configurations of the tem- 
pered run used m = 0.020 at which time can be 
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Figure 1. Part of the time history of the con- 
figuration number running at m = 0.020 in the 
tempered run. 



Figure 2. The auto-correlation function for the 
2x2 Wilson loop at m = 0.020 from set T and a 
standard HMC reference run. 



seen in the time history of figure It is clear 
that all five configurations have run at each mass 
about equally often, as required. 

The acceptance rate for transitions between 
masses is also shown in table ^, and lies 
around 20%. Another run with closer masses, 
{0.020,0.023,0.026,0.029,0.032}, yielded rates 
about ten percentage points higher. An accep- 
tance rate of around 20% to 30% seems optimal. 

A measure of the speed of an algorithm is Tj^j, 
the integrated auto-correlation |^ for an observ- 
able O. With insufficient data, A^data < lOOOrint, 
an accurate value cannot be obtained. The 
largest value for tP^ of all observables O defines 
the number of independent configurations. 

For full QCD the global topological charge Q 
seems to be the slowest observable[Q . However, 
on this size lattice the topology (field theoretic 
definition) turned out to depend on the action 
used for cooling, and is probably not well de- 
fined^. The plaquette and Wilson loops up to 
2x2 seem to have the most well defined auto- 



^The cooling actions tested used 1x1 and 1x2 loops with 
various values of the coefficients. On a given configuration 
different choices for the action lead to completely different 
global topological charges. 



correlation, and have been used here. 

In figure || the auto-correlation function for the 
2x2 Wilson loop at to = 0.020 from the tem- 
pering and a standard HMC run is plotted. The 
integrated autocorrelations obtained for both the 
plaquette and the 2x2 Wilson loop are given in 
table ^ These turn out to be about 20% smaller 
than the slope parameter needed to fit the central 
part of the correlator in figure |^ to an exponen- 
tial. 

For observables from the tempered run. Tint is 
about three times smaller than from the stan- 
dard HMC run. The computer time needed per 
tempered trajectory, Tt, compared with the time 
for a single HMC run at the smallest mass yields 
Tt = 3.22Tf^oo2. 

4. CONCLUSIONS 

Tempering yields an integrated auto- 
correlation that is about a factor of three smaller 
than the HMC run, although much better data 
is needed to make this reliable! Tempering costs 
about three times more than the single HMC run 
at the smallest mass, so it is clearly faster if more 
than one mass is required, as is usually the case! 

For realistic simulations, using improved ac- 
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tions on large, smooth lattices at very small quark 
masses, tempered methods are very likely to be 
of benefit. This is especially true for Wilson 
fermions, where many k, values are needed in or- 
der to extract meaningful physics. 
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